An interactive account of bilingual lexical acquisition
Associating a word-form with its referential context (and more)
Challenging task: ambiguity, variability
Earliest evidence of word acquisition: 6 months of age (Jusczyk and Aslin 1995; Bergelson and Swingley 2012)
Bilingual word acquisition: more than one word-form per referent
gos → DOG ← perro
Vocabulary checklist: number/proportion of words checked by caregivers as Understands, and/or Says (e.g., CDI, Fenson et al. 1994)
| | Understands | Understands & Says |
|---|---|---|
| chair | [ x ] | [ ] |
| table | [ ] | [ ] |
| … | [ ] | [ x ] |
English-Spanish bilinguals: smaller English vocab. size compared to monolinguals, but similar total vocab. size (Hoff et al. 2012)
Mixed evidence on other language pairs: English-French, Catalan-Spanish, English-Dutch
Bilingual toddlers learning two typologically close languages: larger vocabulary sizes than those learning more distant languages (Floccia et al. 2018)
Cognates: form-similar translation equivalents (TEs)
| Cognate | Non-cognate |
|---|---|
| [cat] /ˈgat-ˈgato/ | [dog] /ˈgos-ˈpe.ro/ |
Bilinguals acquire TEs from the earliest stages of vocabulary growth (Bilson et al. 2015; Tsui et al. 2022)
Cognates are acquired earlier than non-cognates (Mitchell, Tsui, and Byers-Heinlein 2022; Bosch and Ramon-Casas 2014)
Why would cognates be acquired earlier?
Lexical access is language non-selective:
Translation equivalents are co-activated, even in monolingual situations (e.g., Costa, Caramazza, and Sebastian-Galles 2000)
Dissociation between models of bilingual word processing and word acquisition
Word acquisition as a continuous process of lexical consolidation (Hidaka 2013; Mollica and Piantadosi 2017)
For participant i and word j:
\begin{aligned} \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Frequency}_j \\ \text{Frequency}_j &\sim \text{Poisson}(\lambda) \\ \\ \textbf{For simulations:}\\ \lambda &= 50 \\ \text{Threshold} &= 300 \\ \\ \text{Age of Acquisition}_{ij} &= \min\{\text{Age}_i : \text{Learning instances}_{ij} \geq \text{Threshold}\} \end{aligned}
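The accumulator model above can be sketched in a few lines of Python. This is a minimal illustration, not the simulation code behind the talk: the function names, the use of whole months as the age unit, and the Poisson sampler (Knuth's multiplication method) are assumptions made for the example. Only λ = 50 and the threshold of 300 come from the slides.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def learning_instances(age, frequency):
    """Learning instances_ij = Age_i * Frequency_j."""
    return age * frequency


def age_of_acquisition(frequency, threshold=300):
    """Smallest whole age at which the learning instances reach the threshold."""
    if frequency <= 0:
        return None  # a word that is never encountered never reaches the threshold
    return math.ceil(threshold / frequency)


rng = random.Random(2023)  # fixed seed so the sketch is reproducible
frequencies = [sample_poisson(50, rng) for _ in range(5)]  # lambda = 50
for freq in frequencies:
    print(freq, age_of_acquisition(freq))
```

For a word whose frequency equals the expected value of 50, the threshold of 300 is reached at age 6 in this sketch. More frequent words reach it earlier, less frequent words later.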
| | Catalan | Spanish |
|---|---|---|
| Language exposure | 60% | 40% |
Hypothesis: word-representations receive learning instances from their translations
Increment in learning instances: proportional to form-similarity (cognateness)
[Figure panels: Cognate, Non-cognate]
Including learning instances from parallel activation
\begin{aligned} \textbf{Monolinguals:} \\ \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Frequency}_j \end{aligned}
\begin{aligned} \textbf{Bilinguals:} \\ \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Frequency}_j \cdot \text{Exposure}_i \\ &\quad + \text{Cognateness}_j \cdot \text{Learning instances}_{ij'} \end{aligned}
where j' is the translation equivalent of word j
| | Catalan | Spanish |
|---|---|---|
| Language exposure | 60% | 40% |
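The bilingual equation can be illustrated with a short Python sketch. Several things here are assumptions rather than details from the slides: the function names are hypothetical, both words in a pair are given the same frequency, and the cross-language term uses only the translation's own (direct-exposure) learning instances, which sidesteps the circularity of each word depending on the other's total. The slides do not say how that circularity is resolved. The cognateness values 0.5 and 0.0 are illustrative.

```python
def own_instances(age, frequency, exposure):
    """Learning instances from direct exposure: Age * Frequency * Exposure."""
    return age * frequency * exposure


def bilingual_instances(age, frequency, exposure, cognateness):
    """Learning instances for a word and for its translation equivalent.

    `exposure` is the proportion of exposure to the word's language; the
    translation's language receives the remainder.
    ASSUMPTION: the cross-language term uses the translation's own
    instances only, not its full (recursively defined) total.
    """
    word = own_instances(age, frequency, exposure)
    translation = own_instances(age, frequency, 1 - exposure)
    return word + cognateness * translation, translation + cognateness * word


# Age 10, frequency 50, 60% Catalan / 40% Spanish exposure
cognate = bilingual_instances(10, 50, 0.6, cognateness=0.5)
non_cognate = bilingual_instances(10, 50, 0.6, cognateness=0.0)
print(cognate, non_cognate)
```

Under these assumptions the cognate pair accumulates more learning instances than the non-cognate pair at the same age, and the relative gain is larger for the word in the lower-exposure language, since it borrows from the more frequently heard translation.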
Barcelona Vocabulary Questionnaire (BVQ)
Participants filled in one of four versions of the questionnaire
500 items: 250 Catalan + 250 Spanish
Short-listed (nouns): 302 translation equivalents (TE)
138,078 item responses from 366 participants
| Times responded | 1 time | 2 times | 3 times | 4 times |
|---|---|---|---|---|
| Participants | 312 | 42 | 8 | 4 |
Ordinal regression model: P(Understands), P(Says)
Multilevel: Crossed-random effects
Bayesian: probability of parameter values
P(\text{model} | \text{data}) \propto P(\text{data} | \text{model}) \times P(\text{model})
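The proportionality above can be made concrete with a grid approximation for a single proportion. This is a toy sketch, unrelated to the actual model fit: the data (7 of 10 words understood), the uniform prior, and the function name are all hypothetical.

```python
def grid_posterior(successes, trials, prior, grid):
    """Posterior over `grid` for a binomial proportion: likelihood x prior, normalised."""
    unnormalised = [
        (p ** successes) * ((1 - p) ** (trials - successes)) * prior(p)
        for p in grid
    ]
    total = sum(unnormalised)
    return [value / total for value in unnormalised]


def flat_prior(p):
    """ASSUMPTION: every parameter value is equally plausible a priori."""
    return 1.0


grid = [i / 100 for i in range(101)]                  # candidate parameter values
posterior = grid_posterior(7, 10, flat_prior, grid)   # hypothetical data
best = grid[posterior.index(max(posterior))]          # most probable grid value
print(best)
```

The binomial coefficient is omitted because it is constant across parameter values and cancels in the normalisation.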
| Predictor | Measure |
|---|---|
| Age | Months |
| Length | Number of phonemes |
| Exposure | Lexical frequency × Language exposure |
| Cognateness | Levenshtein similarity between a word-form and its translation |
| Interactions | Two-way and three-way interactions between age, exposure, and cognateness |
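How such predictors might enter an ordinal model with three ordered responses (No < Understands < Understands and Says) can be sketched as follows. This assumes a cumulative-logit link, which the slides do not state, leaves out the crossed random effects, and includes only one of the interactions for brevity. All coefficient and threshold values are hypothetical and are not estimates from the study.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def response_probabilities(eta, thresholds):
    """Cumulative-logit model: P(Y <= k) = logistic(threshold_k - eta).

    Returns P(No), P(Understands only), P(Understands and Says).
    """
    lower, upper = thresholds
    p_le_0 = logistic(lower - eta)
    p_le_1 = logistic(upper - eta)
    return p_le_0, p_le_1 - p_le_0, 1.0 - p_le_1


def linear_predictor(z, beta):
    """Main effects of the standardised predictors plus one interaction term."""
    main = ("age", "length", "exposure", "cognateness")
    eta = sum(beta[name] * z[name] for name in main)
    return eta + beta["exposure:cognateness"] * z["exposure"] * z["cognateness"]


# HYPOTHETICAL coefficients and thresholds, chosen only for illustration
beta = {"age": 1.0, "length": -0.2, "exposure": 0.6,
        "cognateness": 0.2, "exposure:cognateness": -0.1}
thresholds = (-0.5, 1.5)
z = {"age": 1.0, "length": 0.0, "exposure": -1.0, "cognateness": 1.0}  # z-scores

probs = response_probabilities(linear_predictor(z, beta), thresholds)
p_understands = probs[1] + probs[2]  # P(Understands), with or without Says
p_says = probs[2]                    # P(Says)
print(round(p_understands, 3), round(p_says, 3))
```

Because the response categories are ordered, P(Says) can never exceed P(Understands) in this parameterisation.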
| Predictor | Estimate | 95% HDI | p(H0) |
|---|---|---|---|
| Intercepts | | | |
| Comprehension and Production | 0.438 | [-0.5, 0.5] | 0.088 |
| Comprehension | 0.936 | [2.44, 0.95] | 0.000 |
| Slopes | | | |
| Age (+1 SD, 4.87 months) | 0.405 | [1.43, 0.45] | 0.000 |
| Exposure (+1 SD, 1.81) | 0.233 | [0.8, 0.27] | 0.000 |
| Cognateness (+1 SD, 0.26) | 0.058 | [0.06, 0.1] | 0.037 |
| Length (+1 SD, 1.56 phonemes) | -0.062 | [-0.35, -0.04] | 0.000 |
| Age × Exposure | 0.071 | [0.16, 0.1] | 0.000 |
| Age × Cognateness | 0.014 | [0, 0.03] | 0.985 |
| Exposure × Cognateness | -0.057 | [-0.28, -0.05] | 0.000 |
| Age × Exposure × Cognateness | -0.018 | [-0.11, -0.01] | 0.975 |
Levenshtein distance: minimum number of single-character edits (insertions, deletions, substitutions) needed to make two character strings identical
| | Orthography | Phonology | String |
|---|---|---|---|
| Catalan | porta | /ˈpɔɾ.tə/ | pɔɾtə |
| Spanish | puerta | /ˈpweɾ.ta/ | pweɾta |
1 - \frac{\text{lev}(A, B)}{\max(\text{length}(A), \text{length}(B))}
| Catalan | Spanish | Levenshtein similarity (distance) |
|---|---|---|
| porta (/ˈpɔɾ.tə/) | puerta (/ˈpweɾ.ta/) | 0.50 (3) |
| taula (/ˈtaw.lə/) | mesa* (/ˈmesa/) | 0.00 (5) |
| cotxe (/ˈkɔ.t͡ʃə/) | coche (/ˈkot͡ʃe/) | 0.40 (3) |
| … | … | … |
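The similarity measure can be sketched in Python as below. This is a minimal implementation written for illustration, with hypothetical function names, and is not necessarily the code used to compute the values in the table. It compares strings one Unicode code point at a time, so a symbol written with a tie bar or other combining mark would count as several characters unless the transcriptions are pre-processed.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def similarity(a, b):
    """1 - lev(A, B) / max(length(A), length(B))."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # ASSUMPTION: two empty strings count as identical
    return 1 - levenshtein(a, b) / longest


for catalan, spanish in [("pɔɾtə", "pweɾta"), ("tawlə", "mesa")]:
    print(catalan, spanish, levenshtein(catalan, spanish), similarity(catalan, spanish))
```

For the first two rows of the table, this gives a distance of 3 and a similarity of 0.5 for porta/puerta, and a distance of 5 and a similarity of 0.0 for taula/mesa.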
International Symposium of Psycholinguistics | Vitoria, 31st May, 2023